perf(autoware_tensorrt_plugins): keep SegmentCSR allocation-free #12555

Merged
mojomex merged 2 commits into
autowarefoundation:main from
mojomex:perf/trt-plugins-no-segmentcsr-allocs
Jun 15, 2026

@mojomex mojomex commented May 7, 2026

Contributor

Summary

Fourth PR in the SegmentCSR split stack.

After the test, contract-cleanup, and n-dimensional indptr removal commits, this final commit keeps segment_csr_launch allocation-free by filling the output buffer directly instead of allocating, filling, copying, and freeing a scratch base buffer on every launch.
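
A rough CPU-side sketch of the change described above, assuming a sum reduction with a zero base value. The names `segment_sum_with_scratch` and `segment_sum_direct` are hypothetical and not taken from the plugin; the real `segment_csr_launch` runs on the GPU, where the scratch buffer would presumably live in device memory. The sketch only shows that dropping the scratch buffer leaves the result unchanged.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU analogue of a SegmentCSR sum reduction (an assumption for
// illustration, not the plugin's code). `indptr` holds num_segments + 1
// offsets; segment s covers src[indptr[s]] .. src[indptr[s + 1] - 1].
// `out` must already have num_segments elements.

// Old shape: allocate a scratch "base" buffer, fill it, copy it into `out`,
// then reduce. The scratch vector is allocated and freed on every call.
void segment_sum_with_scratch(
  const std::vector<float> & src, const std::vector<int> & indptr, std::vector<float> & out)
{
  std::vector<float> base(out.size());                // per-call allocation
  std::fill(base.begin(), base.end(), 0.0f);          // fill the scratch buffer
  std::copy(base.begin(), base.end(), out.begin());   // copy into the output
  for (std::size_t s = 0; s + 1 < indptr.size(); ++s) {
    for (int i = indptr[s]; i < indptr[s + 1]; ++i) {
      out[s] += src[i];
    }
  }
}  // `base` is freed here

// New shape: write the base value straight into `out`, with no scratch buffer.
void segment_sum_direct(
  const std::vector<float> & src, const std::vector<int> & indptr, std::vector<float> & out)
{
  std::fill(out.begin(), out.end(), 0.0f);            // initialize the output directly
  for (std::size_t s = 0; s + 1 < indptr.size(); ++s) {
    for (int i = indptr[s]; i < indptr[s + 1]; ++i) {
      out[s] += src[i];
    }
  }
}
```

With `src = {1, 2, 3, 4, 5}` and `indptr = {0, 2, 2, 5}`, both functions write `{3, 0, 12}` into a three-element `out`; the empty middle segment keeps the base value.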

Stack

  1. test(autoware_tensorrt_plugins): add SegmentCSR kernel tests (#12740).
  2. refactor(autoware_tensorrt_plugins): clarify SegmentCSR contract (#12741): document and clarify the contract, and clean up the in/out API names.
  3. refactor(autoware_tensorrt_plugins): remove SegmentCSR nD indptr path (#12739): remove the dead n-dimensional indptr path.
  4. This PR: keep SegmentCSR allocation-free.

Benchmark context

This keeps the original #12555 optimization isolated. The prior measurements showed an isolated SegmentCSR-path improvement of about 2.7% on top of #12554 for PTv3-T18, with standalone kernel microbenchmarks showing the per-call allocation removal saving roughly 14 microseconds.

github-actions Bot commented May 7, 2026

Thank you for contributing to the Autoware project!

🚧 If your pull request is in progress, switch it to draft mode.

Please ensure:

@github-actions github-actions Bot added type:documentation Creating or refining documentation. (auto-assigned) component:perception Advanced sensor data processing and environment understanding. (auto-assigned) component:sensing Data acquisition from sensors, drivers, preprocessing. (auto-assigned) component:planning Route planning, decision-making, and navigation. (auto-assigned) component:control Vehicle control algorithms and mechanisms. (auto-assigned) component:system System design and integration. (auto-assigned) component:vehicle Vehicle-specific implementations, drivers, packages. (auto-assigned) type:ci Continuous Integration (CI) processes and testing. (auto-assigned) component:common Common packages from the autoware-common repository. (auto-assigned) component:simulation Virtual environment setups and simulations. (auto-assigned) component:evaluator Evaluation tools for planning, localization etc. (auto-assigned) labels May 7, 2026
@mojomex mojomex force-pushed the perf/trt-plugins-no-segmentcsr-allocs branch from c90d878 to 971643a Compare May 7, 2026 14:24
@github-actions github-actions Bot removed type:documentation Creating or refining documentation. (auto-assigned) component:sensing Data acquisition from sensors, drivers, preprocessing. (auto-assigned) component:planning Route planning, decision-making, and navigation. (auto-assigned) component:control Vehicle control algorithms and mechanisms. (auto-assigned) component:system System design and integration. (auto-assigned) component:vehicle Vehicle-specific implementations, drivers, packages. (auto-assigned) type:ci Continuous Integration (CI) processes and testing. (auto-assigned) component:common Common packages from the autoware-common repository. (auto-assigned) component:simulation Virtual environment setups and simulations. (auto-assigned) component:evaluator Evaluation tools for planning, localization etc. (auto-assigned) labels May 7, 2026
@mojomex mojomex force-pushed the perf/trt-plugins-no-segmentcsr-allocs branch from 9057325 to d416c7b Compare May 29, 2026 05:49

mojomex commented Jun 2, 2026

Contributor Author

Tested the following:

source /opt/ros/humble/setup.bash
colcon build --symlink-install --mixin rel-with-deb-info compile-commands --packages-up-to autoware_ptv3
colcon test --packages-select autoware_tensorrt_plugins --event-handlers console_cohesion+
colcon test-result --verbose

Then launched ptv3 with a custom rosbag that already contains a concat pointcloud, and visually confirmed correctness.

segmentcsr.webm

@manato manato left a comment

Contributor

@mojomex
Thank you for your contribution! I left a comment regarding the index calculation where I was not confident (though I might not fully understand the code flow). I'd appreciate it if you could take a look at it 🙏

Comment thread perception/autoware_tensorrt_plugins/src/scatter_ops/segment_csr.cu Outdated
@github-actions github-actions Bot added the type:documentation Creating or refining documentation. (auto-assigned) label Jun 9, 2026
@mojomex mojomex force-pushed the perf/trt-plugins-no-segmentcsr-allocs branch from b309100 to 2f6c56c Compare June 9, 2026 12:59
@mojomex mojomex requested a review from manato June 9, 2026 13:11
@mojomex mojomex force-pushed the perf/trt-plugins-no-segmentcsr-allocs branch from 2f6c56c to 35ae21a Compare June 10, 2026 08:37
@mojomex mojomex marked this pull request as draft June 10, 2026 08:38
@mojomex mojomex force-pushed the perf/trt-plugins-no-segmentcsr-allocs branch 2 times, most recently from 74067dd to a95664c Compare June 10, 2026 08:52
@mojomex mojomex force-pushed the perf/trt-plugins-no-segmentcsr-allocs branch from a95664c to 88d7c8a Compare June 10, 2026 09:25
@mojomex mojomex force-pushed the perf/trt-plugins-no-segmentcsr-allocs branch from 607313b to 587b46f Compare June 12, 2026 10:40
@github-actions github-actions Bot removed the type:documentation Creating or refining documentation. (auto-assigned) label Jun 12, 2026
@mojomex mojomex marked this pull request as ready for review June 15, 2026 05:45
@mojomex mojomex force-pushed the perf/trt-plugins-no-segmentcsr-allocs branch from 587b46f to d6e3535 Compare June 15, 2026 05:45
Initialize the SegmentCSR output buffer directly instead of allocating, filling, copying, and freeing a scratch base buffer on every launch.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Max SCHMELLER <max.schmeller@tier4.jp>
@mojomex mojomex force-pushed the perf/trt-plugins-no-segmentcsr-allocs branch from d6e3535 to 2ff93e3 Compare June 15, 2026 05:46

@manato manato left a comment

Contributor

@mojomex
LGTM! Thank you for your series of contributions to improving TRT plugins!

@mojomex mojomex merged commit 233e988 into autowarefoundation:main Jun 15, 2026
30 checks passed
@github-project-automation github-project-automation Bot moved this from To Triage to Done in Software Working Group Jun 15, 2026
@mojomex mojomex deleted the perf/trt-plugins-no-segmentcsr-allocs branch June 15, 2026 13:39
tier4-autoware-public-bot Bot pushed a commit to tier4/autoware_universe_perception that referenced this pull request Jun 15, 2026
…owarefoundation/autoware_universe#12555)

Initialize the SegmentCSR output buffer directly instead of allocating, filling, copying, and freeing a scratch base buffer on every launch.

Signed-off-by: Max SCHMELLER <max.schmeller@tier4.jp>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Labels

component:perception Advanced sensor data processing and environment understanding. (auto-assigned) run:build-and-test-differential Mark to enable build-and-test-differential workflow. (used-by-ci)

Projects

Status: Done

2 participants